14 research outputs found

    Exploring emerging technologies for extreme scale HPC architectures

    While architectures and programming models have remained relatively stable for almost two decades, new architectural features, such as heterogeneous processing, nonvolatile memory, and optical interconnection networks, will demand that software systems and applications be redesigned so that they expose massive amounts of hierarchical parallelism, carefully orchestrate data movement, and balance concerns over performance, power, and resiliency. This instability has led to two inevitable problems: decreased programmer productivity and difficult performance prediction. In this talk, I will describe a solution to each of these problems: our OpenARC compiler and runtime system, and our Aspen performance modeling language. First, OpenARC is a research compiler that supports OpenACC and OpenMP4 and can generate code in CUDA, OpenCL, and LLVM IR. OpenARC has allowed us to explore how to achieve performance portability of applications across diverse architectures. Second, Aspen is a domain-specific language for structured analytical modeling of applications and architectures. It is designed to enable rapid exploration of new algorithms and architectures. Once created, Aspen models can be used for a variety of purposes, including predicting the performance of future applications, evaluating system architectures, informing runtime scheduling decisions, and identifying system anomalies.

    Characterization of scientific workloads on systems with multi-core processors

    Multi-core processors are planned for virtually all next-generation HPC systems. In a preliminary evaluation of dual-core AMD Opteron processor systems, we investigated the scaling behavior of a set of micro-benchmarks, kernels, and applications. In addition, we evaluated a number of processor affinity techniques for managing memory placement on these multi-core systems. We found that an appropriate selection of MPI task and memory placement schemes can yield a performance improvement of over 25% for key scientific calculations. We also collected detailed performance data for several large-scale scientific applications. Analyses of the application performance results confirmed our micro-benchmark and scaling results. Keywords: performance characterization, multi-core processor, AMD Opteron, micro-benchmarking, scientific applications